Offline recognition of handwritten chinese characters using Gabor features, CDHMM modeling and MCE training
Internal identifier: 001970 (Main/Exploration); previous: 001969; next: 001971
Authors: YONG GE [People's Republic of China]; QIANG HUO [Hong Kong]; Zhi-Dan Feng [Hong Kong]
Source:
- Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing [ 1520-6149 ] ; 2002.
French descriptors
- Pascal (Inist)
- Reconnaissance caractère manuscrit, Reconnaissance optique caractère, Chinois, Apprentissage, Modélisation, Filtre Gabor, Modèle Markov variable cachée, Reconnaissance forme, Evaluation performance, Précision élevée, Extraction caractéristique, Analyse discriminante, Traitement signal, Erreur classification minimale.
English descriptors
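- KwdEn: Chinese, Discriminant analysis, Feature extraction, Gabor filter, Handwritten character recognition, Hidden Markov models, High precision, Learning, Minimum classification error, Modeling, Optical character recognition, Pattern recognition, Performance evaluation, Signal processing.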
Abstract
We've been developing a Chinese OCR engine for handwritten Chinese scripts. Currently, our OCR engine supports a vocabulary of 4616 characters which include 4516 simplified Chinese characters in GB2312-80, 62 alphanumeric characters, 38 punctuation marks and symbols. By using 1,384,800 character samples to train our recognizer, an averaged character recognition accuracy of 96.34% is achieved on a testing set of 1,025,535 character samples. An arguably best Chinese OCR product on the market achieves an accuracy of 94.07% for the recognizable Chinese characters in the above testing set. In this paper, we describe key techniques used in our recognizer that contribute to the high recognition accuracy, namely the use of Gabor features and their spatial derivatives as raw features, the use of LDA for feature extraction and dimension reduction, the use of CDHMMs for modeling Chinese characters along both horizontal and vertical directions, and the use of minimum classification error as a criterion for model training.
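The abstract names Gabor features as the raw features but gives no filter parameters. As a rough illustration only, the following sketch builds a real (cosine-phase) Gabor kernel and measures its response on a toy stroke image. The kernel size, wavelength, sigma, the four orientations, and the helper names `gabor_kernel` and `gabor_response` are all assumptions made for this example, not values or code from the paper.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Return a size x size real (cosine-phase) Gabor kernel as nested lists.

    size       -- odd kernel width/height in pixels (illustrative value)
    wavelength -- period of the sinusoidal carrier, in pixels
    theta      -- direction along which the carrier varies, in radians
    sigma      -- standard deviation of the isotropic Gaussian envelope
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Project (x, y) onto the carrier direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_response(image, kernel, cx, cy):
    """Correlate the kernel with the image patch centred at (cx, cy).

    Pixels outside the image are treated as zero.
    """
    half = len(kernel) // 2
    height, width = len(image), len(image[0])
    total = 0.0
    for ky, row in enumerate(kernel):
        for kx, weight in enumerate(row):
            y, x = cy + ky - half, cx + kx - half
            if 0 <= y < height and 0 <= x < width:
                total += weight * image[y][x]
    return total

# Toy 9x9 binary image containing a single vertical stroke.
image = [[1.0 if x == 4 else 0.0 for x in range(9)] for _ in range(9)]

# Four carrier directions (0, 45, 90, 135 degrees); an assumed choice.
orientations = [k * math.pi / 4 for k in range(4)]
features = [
    abs(gabor_response(image, gabor_kernel(7, 4.0, t, 2.0), 4, 4))
    for t in orientations
]
```

With the carrier varying horizontally (`theta = 0`), the kernel is constant along the vertical stroke, so that orientation gives the largest magnitude in `features`. A full recognizer would sample such responses over a grid of positions before any dimension reduction such as the LDA step the abstract mentions.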
Affiliations:
- Hong Kong: QIANG HUO, Zhi-Dan Feng
- People's Republic of China: YONG GE
Links to previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000525
- to stream PascalFrancis, to step Curation: 000265
- to stream PascalFrancis, to step Checkpoint: 000602
- to stream Main, to step Merge: 001A55
- to stream Main, to step Curation: 001970
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Offline recognition of handwritten chinese characters using Gabor features, CDHMM modeling and MCE training</title>
<author><name sortKey="Yong Ge" sort="Yong Ge" uniqKey="Yong Ge" last="Yong Ge">YONG GE</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. of Electronic Engineering &amp; Information Science, University of Science and Technology of China</s1>
<s2>Hefei, Anhui</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Hefei, Anhui</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Qiang Huo" sort="Qiang Huo" uniqKey="Qiang Huo" last="Qiang Huo">QIANG HUO</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</s1>
<s3>HKG</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Feng, Zhi Dan" sort="Feng, Zhi Dan" uniqKey="Feng Z" first="Zhi-Dan" last="Feng">Zhi-Dan Feng</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</s1>
<s3>HKG</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">04-0511183</idno>
<date when="2002">2002</date>
<idno type="stanalyst">PASCAL 04-0511183 INIST</idno>
<idno type="RBID">Pascal:04-0511183</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000525</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000265</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000602</idno>
<idno type="wicri:doubleKey">1520-6149:2002:Yong Ge:offline:recognition:of</idno>
<idno type="wicri:Area/Main/Merge">001A55</idno>
<idno type="wicri:Area/Main/Curation">001970</idno>
<idno type="wicri:Area/Main/Exploration">001970</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Offline recognition of handwritten chinese characters using Gabor features, CDHMM modeling and MCE training</title>
<author><name sortKey="Yong Ge" sort="Yong Ge" uniqKey="Yong Ge" last="Yong Ge">YONG GE</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. of Electronic Engineering &amp; Information Science, University of Science and Technology of China</s1>
<s2>Hefei, Anhui</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Hefei, Anhui</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Qiang Huo" sort="Qiang Huo" uniqKey="Qiang Huo" last="Qiang Huo">QIANG HUO</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</s1>
<s3>HKG</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Feng, Zhi Dan" sort="Feng, Zhi Dan" uniqKey="Feng Z" first="Zhi-Dan" last="Feng">Zhi-Dan Feng</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</s1>
<s3>HKG</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Dept. of Computer Science &amp; Information Systems, The University of Hong Kong, Pokfulam Road</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing</title>
<idno type="ISSN">1520-6149</idno>
<imprint><date when="2002">2002</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing</title>
<idno type="ISSN">1520-6149</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Chinese</term>
<term>Discriminant analysis</term>
<term>Feature extraction</term>
<term>Gabor filter</term>
<term>Handwritten character recognition</term>
<term>Hidden Markov models</term>
<term>High precision</term>
<term>Learning</term>
<term>Minimum classification error</term>
<term>Modeling</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Signal processing</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère manuscrit</term>
<term>Reconnaissance optique caractère</term>
<term>Chinois</term>
<term>Apprentissage</term>
<term>Modélisation</term>
<term>Filtre Gabor</term>
<term>Modèle Markov variable cachée</term>
<term>Reconnaissance forme</term>
<term>Evaluation performance</term>
<term>Précision élevée</term>
<term>Extraction caractéristique</term>
<term>Analyse discriminante</term>
<term>Traitement signal</term>
<term>Erreur classification minimale</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We've been developing a Chinese OCR engine for handwritten Chinese scripts. Currently, our OCR engine supports a vocabulary of 4616 characters which include 4516 simplified Chinese characters in GB2312-80, 62 alphanumeric characters, 38 punctuation marks and symbols. By using 1,384,800 character samples to train our recognizer, an averaged character recognition accuracy of 96.34% is achieved on a testing set of 1,025,535 character samples. An arguably best Chinese OCR product on the market achieves an accuracy of 94.07% for the recognizable Chinese characters in the above testing set. In this paper, we describe key techniques used in our recognizer that contribute to the high recognition accuracy, namely the use of Gabor features and their spatial derivatives as raw features, the use of LDA for feature extraction and dimension reduction, the use of CDHMMs for modeling Chinese characters along both horizontal and vertical directions, and the use of minimum classification error as a criterion for model training.</div>
</front>
</TEI>
<affiliations><list><country><li>Hong Kong</li>
<li>République populaire de Chine</li>
</country>
</list>
<tree><country name="République populaire de Chine"><noRegion><name sortKey="Yong Ge" sort="Yong Ge" uniqKey="Yong Ge" last="Yong Ge">YONG GE</name>
</noRegion>
</country>
<country name="Hong Kong"><noRegion><name sortKey="Qiang Huo" sort="Qiang Huo" uniqKey="Qiang Huo" last="Qiang Huo">QIANG HUO</name>
</noRegion>
<name sortKey="Feng, Zhi Dan" sort="Feng, Zhi Dan" uniqKey="Feng Z" first="Zhi-Dan" last="Feng">Zhi-Dan Feng</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001970 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001970 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:04-0511183 |texte= Offline recognition of handwritten chinese characters using Gabor features, CDHMM modeling and MCE training }}
This area was generated with Dilib version V0.6.32.